1. Write a function that takes as input two lists or two 1-D arrays of real numbers of the same length and returns their Euclidean distance. Hint: use vectorization.

In [11]:
import numpy as np
import math

def distance(p1, p2):
    if len(p1) != len(p2):
        return 'errore'
    else:
        # convert so that plain lists are accepted as well as arrays
        p1, p2 = np.asarray(p1), np.asarray(p2)
        differenza = p1 - p2
        quadrato = differenza**2
        somma = np.sum(quadrato)
        return math.sqrt(somma)

p1 = np.array([1, 2])
p2 = np.array([2, 1])
print(distance(p1, p2))
1.4142135623730951
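The same computation can be written more compactly with NumPy's norm of the difference vector; a minimal sketch (the name `distance_norm` is mine, not part of the original solution):

```python
import numpy as np

def distance_norm(p1, p2):
    # convert first so plain lists are accepted too
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    if p1.shape != p2.shape:
        return 'errore'
    # the 2-norm of the difference vector is the Euclidean distance
    return np.linalg.norm(p1 - p2)

print(distance_norm([1, 2], [2, 1]))  # 1.4142135623730951
```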
  1. Import the data set and display the type of each variable.
In [12]:
import os
os.getcwd()
Out[12]:
'/Users/ludovicavargiu'
In [15]:
os.chdir('/Users/ludovicavargiu/Desktop/Laboratorio Python')
In [16]:
import pandas as pd

cars = pd.read_csv('cars.csv')
In [17]:
cars
Out[17]:
mpg cylinders displacement horsepower weight acceleration model origin car_name price
0 18.0 8 307.0 130 3504 12.0 70 USA chevrolet chevelle malibu 25561.59078
1 15.0 8 350.0 165 3693 11.5 70 USA buick skylark 320 24221.42273
2 18.0 8 318.0 150 3436 11.0 70 USA plymouth satellite 27240.84373
3 16.0 8 304.0 150 3433 12.0 70 USA amc rebel sst 33684.96888
4 17.0 8 302.0 140 3449 10.5 70 USA ford torino 20000.00000
... ... ... ... ... ... ... ... ... ... ...
387 27.0 4 140.0 86 2790 15.6 82 USA ford mustang gl 13432.50000
388 44.0 4 97.0 52 2130 24.6 82 Europe vw pickup 37000.00000
389 32.0 4 135.0 84 2295 11.6 82 USA dodge rampage 47800.00000
390 28.0 4 120.0 79 2625 18.6 82 USA ford ranger 46000.00000
391 31.0 4 119.0 82 2720 19.4 82 USA chevy s-10 9000.00000

392 rows × 10 columns

In [18]:
cars.dtypes
Out[18]:
mpg             float64
cylinders         int64
displacement    float64
horsepower        int64
weight            int64
acceleration    float64
model             int64
origin           object
car_name         object
price           float64
dtype: object
  1. Select the first 10 rows of the dataset and the columns 'mpg' and 'acceleration', using both loc and iloc.

In [57]:
iloc = cars.iloc[0:10, [0, 5]]
iloc
Out[57]:
mpg acceleration
0 18.0 12.0
1 15.0 11.5
2 18.0 11.0
3 16.0 12.0
4 17.0 10.5
5 15.0 10.0
6 14.0 9.0
7 14.0 8.5
8 14.0 10.0
9 15.0 8.5
In [58]:
loc = cars.loc[0:9, ['mpg', 'acceleration']]
loc
Out[58]:
mpg acceleration
0 18.0 12.0
1 15.0 11.5
2 18.0 11.0
3 16.0 12.0
4 17.0 10.5
5 15.0 10.0
6 14.0 9.0
7 14.0 8.5
8 14.0 10.0
9 15.0 8.5
In [21]:
cars[(cars['horsepower'] > 150) & (cars['acceleration'] < 12)]
Out[21]:
mpg cylinders displacement horsepower weight acceleration model origin car_name price
1 15.0 8 350.0 165 3693 11.5 70 USA buick skylark 320 24221.422730
5 15.0 8 429.0 198 4341 10.0 70 USA ford galaxie 500 30000.000000
6 14.0 8 454.0 220 4354 9.0 70 USA chevrolet impala 35764.334900
7 14.0 8 440.0 215 4312 8.5 70 USA plymouth fury iii 25899.465570
8 14.0 8 455.0 225 4425 10.0 70 USA pontiac catalina 32882.537140
9 15.0 8 390.0 190 3850 8.5 70 USA amc ambassador dpl 32617.059280
10 15.0 8 383.0 170 3563 10.0 70 USA dodge challenger se 30000.000000
11 14.0 8 340.0 160 3609 8.0 70 USA plymouth 'cuda 340 33034.922610
13 14.0 8 455.0 225 3086 10.0 70 USA buick estate wagon (sw) 26608.328420
38 14.0 8 400.0 175 4464 11.5 71 USA pontiac catalina brougham 33793.722840
41 12.0 8 383.0 180 4955 11.5 71 USA dodge monaco (sw) 23414.417100
66 11.0 8 429.0 208 4633 11.0 72 USA mercury marquis 16223.268340
89 12.0 8 429.0 198 4952 11.5 73 USA mercury marquis brougham 30000.000000
93 13.0 8 440.0 215 4735 11.0 73 USA chrysler new yorker brougham 40000.000000
94 12.0 8 455.0 225 4951 11.0 73 USA buick electra 225 custom 26260.634730
95 13.0 8 360.0 175 3821 11.0 73 USA amc ambassador brougham 40000.000000
115 16.0 8 400.0 230 4278 9.5 73 USA pontiac grand prix 27104.011820
123 11.0 8 350.0 180 3664 11.0 73 USA oldsmobile omega 9006.648949
154 16.0 8 400.0 170 4668 11.5 75 USA pontiac catalina 40000.000000
227 16.0 8 400.0 180 4220 11.1 77 USA pontiac grand prix lj 16330.963190
228 15.5 8 350.0 170 4165 11.4 77 USA chevrolet monte carlo landau 40000.000000
  1. How many cars were produced in Japan? Draw a bar chart with the number of cars per country of production.

In [22]:
cars['origin'].value_counts()
Out[22]:
origin
USA       245
Japan      79
Europe     68
Name: count, dtype: int64
In [23]:
import seaborn as sns

sns.catplot(data = cars, x = 'origin', kind = 'count')
Out[23]:
<seaborn.axisgrid.FacetGrid at 0x13b6c3ce0>
[image: count plot of cars by origin]
  1. Compute the average car price, grouping by 'model'.
In [24]:
cars[['price', 'model']].groupby('model').mean()
Out[24]:
price
model
70 27464.198038
71 29766.801306
72 27407.591721
73 31395.407957
74 28714.623571
75 26359.907582
76 29185.822009
77 29012.154454
78 32967.374645
79 29340.843358
80 33269.261384
81 29000.531344
82 30438.724092
  1. Add a new variable 'kml' to the dataset, obtained by converting 'mpg' to km/l using the conversion factor given in the variable description. Show the distribution of 'kml' conditional on the car's country of origin using boxplots.

In [27]:
cars['kml'] = cars['mpg']*0.425170
cars
Out[27]:
mpg cylinders displacement horsepower weight acceleration model origin car_name price kml
0 18.0 8 307.0 130 3504 12.0 70 USA chevrolet chevelle malibu 25561.59078 7.65306
1 15.0 8 350.0 165 3693 11.5 70 USA buick skylark 320 24221.42273 6.37755
2 18.0 8 318.0 150 3436 11.0 70 USA plymouth satellite 27240.84373 7.65306
3 16.0 8 304.0 150 3433 12.0 70 USA amc rebel sst 33684.96888 6.80272
4 17.0 8 302.0 140 3449 10.5 70 USA ford torino 20000.00000 7.22789
... ... ... ... ... ... ... ... ... ... ... ...
387 27.0 4 140.0 86 2790 15.6 82 USA ford mustang gl 13432.50000 11.47959
388 44.0 4 97.0 52 2130 24.6 82 Europe vw pickup 37000.00000 18.70748
389 32.0 4 135.0 84 2295 11.6 82 USA dodge rampage 47800.00000 13.60544
390 28.0 4 120.0 79 2625 18.6 82 USA ford ranger 46000.00000 11.90476
391 31.0 4 119.0 82 2720 19.4 82 USA chevy s-10 9000.00000 13.18027

392 rows × 11 columns

In [28]:
sns.catplot(data = cars, x = 'origin', y = 'kml', kind = 'box')
Out[28]:
<seaborn.axisgrid.FacetGrid at 0x13bc47e90>
[image: boxplots of kml by origin]
  1. Show the relationships between the quantitative variables of the data set using pairplot, coloring the points by the number of cylinders.

In [29]:
sns.pairplot(data = cars, hue = 'cylinders')
Out[29]:
<seaborn.axisgrid.PairGrid at 0x1381c12b0>
[image: pairplot of the cars variables, colored by cylinders]
  1. Fit a linear regression model of 'weight' (response) on 'horsepower' (explanatory variable), including the intercept. Are the estimated coefficients consistent with the scatterplot obtained in the previous step?

In [30]:
import statsmodels.api as sm

lm = sm.OLS(cars.weight, sm.add_constant(cars.horsepower))
res = lm.fit()
print(res.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 weight   R-squared:                       0.747
Model:                            OLS   Adj. R-squared:                  0.747
Method:                 Least Squares   F-statistic:                     1154.
Date:                Tue, 21 Jan 2025   Prob (F-statistic):          1.36e-118
Time:                        13:03:20   Log-Likelihood:                -2929.9
No. Observations:                 392   AIC:                             5864.
Df Residuals:                     390   BIC:                             5872.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        984.5003     62.514     15.748      0.000     861.593    1107.408
horsepower    19.0782      0.562     33.972      0.000      17.974      20.182
==============================================================================
Omnibus:                       11.785   Durbin-Watson:                   0.933
Prob(Omnibus):                  0.003   Jarque-Bera (JB):               21.895
Skew:                           0.109   Prob(JB):                     1.76e-05
Kurtosis:                       4.137   Cond. No.                         322.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The slope is positive, and therefore consistent with the previous scatterplot.

In [31]:
sns.lmplot(x = 'horsepower', y = 'weight', data = cars, ci = None)
Out[31]:
<seaborn.axisgrid.FacetGrid at 0x13f647410>
[image: scatterplot of weight vs horsepower with fitted regression line]

Consider the sequence f(n) = (1 + a)^n, defined for a ≥ −1 and n ≥ 0. Write a function with two arguments, a and n, that returns the n-th term of the sequence and prints an error message if the constraints on a and n are not satisfied. For example, if a = 5 and n = 2, then f(2) = (1 + 5)^2 = 36, whereas a = −3 should produce an error message. Write the function in both recursive and non-recursive form.

In [34]:
def func(a, n):
    if a < -1 or n < 0:
        return 'errore'
    else:
        return (1+a)**n

print(func(-3, 4))
print(func(5, 2))
print(func(3, -2))
print(func(-3, -5))
errore
36
errore
errore
In [35]:
def func_r(a, n):
    if a < -1 or n < 0:
        return 'errore'
    if n == 0:
        return 1
    else:
        return (1+a) * func_r(a, n-1)

print(func_r(-3, 4))
print(func_r(5, 2))
print(func_r(3, -2))
print(func_r(-3, -5))
errore
36
errore
errore
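As a quick sanity check (a small sketch, not part of the original exercise), the two implementations can be compared on a grid of valid inputs; they must always agree:

```python
def func(a, n):
    # closed-form version
    if a < -1 or n < 0:
        return 'errore'
    return (1 + a)**n

def func_r(a, n):
    # recursive version: f(n) = (1 + a) * f(n - 1), f(0) = 1
    if a < -1 or n < 0:
        return 'errore'
    if n == 0:
        return 1
    return (1 + a) * func_r(a, n - 1)

# the two implementations must agree on every valid (a, n) pair
for a in [-1, -0.5, 0, 2, 5]:
    for n in range(6):
        assert func(a, n) == func_r(a, n)
print('ok')
```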
In [36]:
import os
os.getcwd()
Out[36]:
'/Users/ludovicavargiu/Desktop/Laboratorio Python'
In [38]:
import pandas as pd

phd = pd.read_csv('phd.csv')
phd
Out[38]:
articles gender married kids prestige mentor
0 0 male yes 0 2.52 7
1 0 female no 0 2.05 6
2 0 female no 0 3.75 6
3 0 male yes 1 1.18 3
4 0 female no 0 3.75 26
... ... ... ... ... ... ...
910 11 male yes 2 2.86 7
911 12 male yes 1 4.29 35
912 12 male yes 1 1.86 5
913 16 male yes 0 1.74 21
914 19 male yes 0 1.86 42

915 rows × 6 columns

In [39]:
phd.dtypes
Out[39]:
articles      int64
gender       object
married      object
kids          int64
prestige    float64
mentor        int64
dtype: object
  1. Select rows five through twelve of the dataset and the columns 'gender' and 'prestige', using both loc and iloc.

In [59]:
iloc = phd.iloc[4:12, [1, 4]]
iloc
Out[59]:
gender prestige
4 female 3.750
5 female 3.590
6 female 3.190
7 male 2.960
8 male 4.620
9 female 1.250
10 male 2.960
11 female 0.755
In [62]:
loc = phd.loc[4:11, ['gender', 'prestige']]
loc
Out[62]:
gender prestige
4 female 3.750
5 female 3.590
6 female 3.190
7 male 2.960
8 male 4.620
9 female 1.250
10 male 2.960
11 female 0.755
  1. Select the observations with more than 5 articles and fewer than two kids.
In [43]:
phd[(phd['articles'] > 5) & (phd['kids'] < 2)]
Out[43]:
articles gender married kids prestige mentor
877 6 male yes 1 4.62 8
878 6 female yes 0 2.10 36
880 6 male yes 0 4.34 9
881 6 female yes 0 4.29 24
883 6 male yes 1 2.96 13
884 6 male no 0 4.29 18
885 6 male no 0 3.40 14
886 6 female no 0 4.54 12
887 6 male yes 1 3.85 16
888 6 female no 0 3.15 9
889 6 female no 0 4.54 15
890 6 male no 0 3.47 6
891 6 female yes 0 4.29 1
892 6 male no 0 1.97 4
893 6 female no 0 3.32 6
894 7 male yes 0 3.59 1
895 7 male no 0 2.54 6
896 7 male no 0 3.41 20
897 7 male yes 1 1.97 0
898 7 female no 0 3.15 9
899 7 male no 0 4.62 15
900 7 male no 0 4.54 42
901 7 male yes 0 3.69 9
902 7 male no 0 4.34 19
903 7 male no 0 4.29 19
904 7 male yes 1 3.59 27
905 7 male no 0 3.69 19
906 8 male yes 0 2.51 11
907 9 male yes 1 2.96 23
908 9 male yes 1 1.86 47
909 10 female yes 0 3.59 18
911 12 male yes 1 4.29 35
912 12 male yes 1 1.86 5
913 16 male yes 0 1.74 21
914 19 male yes 0 1.86 42
  1. How many PhD students are married?
In [44]:
phd['married'].value_counts()
Out[44]:
married
yes    606
no     309
Name: count, dtype: int64
  1. What is the average number of published articles, conditional on the number of kids?
In [45]:
phd[['articles', 'kids']].groupby('kids').mean()
Out[45]:
articles
kids
0 1.721202
1 1.758974
2 1.542857
3 0.812500
  1. Show the distribution of 'articles' conditional on the number of kids using boxplots.
In [46]:
import seaborn as sns

sns.catplot(data = phd, x = 'kids', y = 'articles', kind = 'box')
Out[46]:
<seaborn.axisgrid.FacetGrid at 0x13f780f80>
[image: boxplots of articles by number of kids]
  1. Plot the histogram of 'prestige', splitting the plot into two facets by 'gender'.
In [47]:
sns.displot(data = phd, x = 'prestige', col = 'gender') # method 1
Out[47]:
<seaborn.axisgrid.FacetGrid at 0x169f316a0>
[image: histograms of prestige, faceted by gender]
In [54]:
g = sns.FacetGrid(phd, col = 'gender') # method 2
g.map(sns.histplot, 'prestige')
Out[54]:
<seaborn.axisgrid.FacetGrid at 0x16a9f4890>
[image: histograms of prestige, faceted by gender]
  1. Fit a linear regression model of 'mentor' (response) on 'prestige' (explanatory variable), including the intercept. Plot the regression line.

In [55]:
import statsmodels.api as sm

lm = sm.OLS(phd.mentor, sm.add_constant(phd.prestige))
res = lm.fit()
print(res.summary())

sns.lmplot(x = 'prestige', y = 'mentor', data = phd, ci = None)
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 mentor   R-squared:                       0.068
Model:                            OLS   Adj. R-squared:                  0.067
Method:                 Least Squares   F-statistic:                     66.42
Date:                Tue, 21 Jan 2025   Prob (F-statistic):           1.19e-15
Time:                        13:59:05   Log-Likelihood:                -3324.1
No. Observations:                 915   AIC:                             6652.
Df Residuals:                     913   BIC:                             6662.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.9807      1.002      0.978      0.328      -0.986       2.948
prestige       2.5093      0.308      8.150      0.000       1.905       3.114
==============================================================================
Omnibus:                      515.309   Durbin-Watson:                   1.763
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             4360.948
Skew:                           2.472   Prob(JB):                         0.00
Kurtosis:                      12.484   Cond. No.                         11.7
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Out[55]:
<seaborn.axisgrid.FacetGrid at 0x16ac7d2b0>
[image: scatterplot of mentor vs prestige with fitted regression line]

Write a universal function (ufunc) that takes as input a list of real numbers and returns the values corresponding to them under the function f(x) = x^3 − x + 1. Create a list of numbers x and the corresponding y values returned by the function, and plot them as a continuous line using matplotlib.

In [68]:
import numpy as np
import matplotlib.pyplot as plt

def f(x):
    return x**3-x+1

x = list(range(-10, 10))
y = np.frompyfunc(f, 1, 1)

plt.plot(x, y(x))
plt.title('graph of the function y = x^3 - x + 1')
plt.xlabel('x axis')
plt.ylabel('y axis')
print(f'x values: {x}')
print(f'y values: {y(x)}')
x values: [-10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y values: [-989 -719 -503 -335 -209 -119 -59 -23 -5 1 1 1 7 25 61 121 211 337 505
 721]
[image: line plot of y = x^3 - x + 1]
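Note that `np.frompyfunc` returns object-dtype arrays. Since f only uses arithmetic operators, it is already vectorized over NumPy arrays, so (as an alternative sketch) a plain array expression gives the same values with a numeric dtype:

```python
import numpy as np

def f(x):
    return x**3 - x + 1

x = np.arange(-10, 10)  # same grid as the list above
y = f(x)                # elementwise via NumPy, numeric dtype instead of object
print(y)
```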
In [70]:
import os
os.getcwd()
Out[70]:
'/Users/ludovicavargiu/Desktop/Laboratorio Python'
In [72]:
import pandas as pd

df = pd.read_csv('homes.csv')
df
Out[72]:
yearbuilt finsqft cooling bedroom fullbath halfbath lotsize totalvalue hsdistrict age condition fp
0 1754 1254 No Central Air 1 1 0 4.933 124300 Western Albemarle 265 Substandard 0
1 1968 1192 No Central Air 3 1 0 1.087 109200 Monticello 51 Substandard 0
2 1754 881 No Central Air 2 1 0 195.930 141600 Albemarle 265 Substandard 0
3 1934 480 No Central Air 0 0 0 10.000 69200 Western Albemarle 85 Substandard 0
4 1963 720 No Central Air 2 1 0 1.000 139700 Western Albemarle 56 Substandard 0
... ... ... ... ... ... ... ... ... ... ... ... ...
3015 1965 1140 No Central Air 3 1 0 0.490 222600 Monticello 54 Excellent 0
3016 1995 6963 Central Air 4 5 1 8.820 2746700 Western Albemarle 24 Excellent 1
3017 1885 1744 Central Air 3 2 0 4.160 333000 Monticello 134 Excellent 1
3018 1988 1638 Central Air 4 3 0 3.815 257900 Albemarle 31 Excellent 0
3019 1955 1659 Central Air 2 2 0 0.523 286300 Albemarle 64 Excellent 0

3020 rows × 12 columns

In [73]:
df.dtypes
Out[73]:
yearbuilt       int64
finsqft         int64
cooling        object
bedroom         int64
fullbath        int64
halfbath        int64
lotsize       float64
totalvalue      int64
hsdistrict     object
age             int64
condition      object
fp              int64
dtype: object
  1. Select rows seven through fifteen of the dataset and the columns 'bedroom' and 'lotsize', using both loc and iloc.

In [74]:
iloc = df.iloc[6:15, [3, 6]]
iloc
Out[74]:
bedroom lotsize
6 2 4.017
7 3 0.950
8 2 0.750
9 0 13.525
10 2 0.910
11 1 0.270
12 2 1.500
13 3 3.003
14 3 3.740
In [75]:
loc = df.loc[6:14, ['bedroom', 'lotsize']]
loc
Out[75]:
bedroom lotsize
6 2 4.017
7 3 0.950
8 2 0.750
9 0 13.525
10 2 0.910
11 1 0.270
12 2 1.500
13 3 3.003
14 3 3.740
  1. Select the observations whose year of construction is 1960 or later and whose size is less than 800 square feet.

In [76]:
df[(df['yearbuilt'] >= 1960) & (df['finsqft'] < 800)]
Out[76]:
yearbuilt finsqft cooling bedroom fullbath halfbath lotsize totalvalue hsdistrict age condition fp
4 1963 720 No Central Air 2 1 0 1.000 139700 Western Albemarle 56 Substandard 0
20 1962 720 Central Air 3 1 0 0.500 61600 Western Albemarle 57 Poor 0
28 1968 672 No Central Air 2 1 0 1.500 99600 Monticello 51 Poor 0
265 1961 768 Central Air 2 1 0 1.464 148000 Monticello 58 Average 0
365 1979 748 Central Air 3 2 0 13.090 248900 Monticello 40 Average 0
430 1970 768 No Central Air 2 1 0 15.250 207700 Monticello 49 Average 0
432 1960 720 Central Air 3 2 0 2.971 185000 Albemarle 59 Average 0
475 1982 768 Central Air 2 1 0 2.604 171300 Western Albemarle 37 Average 0
487 1993 784 Central Air 2 1 0 11.500 207200 Western Albemarle 26 Average 0
673 1982 406 No Central Air 0 1 0 2.000 122700 Monticello 37 Average 1
849 1970 384 No Central Air 0 0 0 7.190 153000 Western Albemarle 49 Average 1
937 1983 796 Central Air 1 2 0 2.350 262700 Western Albemarle 36 Average 0
1007 1985 768 No Central Air 2 1 0 2.200 126100 Monticello 34 Average 0
1035 1970 480 No Central Air 2 1 0 1.040 66200 Monticello 49 Average 0
1533 2006 768 Central Air 2 1 0 0.350 157300 Western Albemarle 13 Average 0
1535 1994 640 No Central Air 2 1 0 1.882 90100 Monticello 25 Average 0
1619 1960 702 No Central Air 1 1 0 5.000 279300 Western Albemarle 59 Average 1
1709 2016 736 Central Air 1 1 0 0.029 203300 Western Albemarle 3 Average 0
2072 1966 640 No Central Air 2 1 0 1.650 152000 Albemarle 53 Average 0
2595 2006 786 No Central Air 3 2 0 5.000 172700 Western Albemarle 13 Good 0
  1. How many houses are in excellent condition? Draw a bar chart with the number of properties per condition.

In [77]:
df['condition'].value_counts()
Out[77]:
condition
Average        2304
Good            507
Fair            133
Poor             32
Excellent        29
Substandard      15
Name: count, dtype: int64
In [78]:
import seaborn as sns

sns.catplot(data = df, x = 'condition', kind = 'count')
Out[78]:
<seaborn.axisgrid.FacetGrid at 0x16a862390>
[image: count plot of homes by condition]
  1. Compute the average property size, grouping by the number of bedrooms.
In [88]:
df[['bedroom', 'finsqft']].groupby('bedroom').mean()
Out[88]:
finsqft
bedroom
0 917.500000
1 1284.333333
2 1339.210356
3 1729.750000
4 2511.640914
5 3107.096774
6 3963.289474
7 4432.888889
8 6736.000000
  1. Add a new variable 'price100' to the dataset, obtained by dividing 'totalvalue' by 100,000. Show the distribution of the new variable conditional on the district using boxplots.

In [80]:
df['price100'] = df['totalvalue']/100000
df
Out[80]:
yearbuilt finsqft cooling bedroom fullbath halfbath lotsize totalvalue hsdistrict age condition fp price100
0 1754 1254 No Central Air 1 1 0 4.933 124300 Western Albemarle 265 Substandard 0 1.243
1 1968 1192 No Central Air 3 1 0 1.087 109200 Monticello 51 Substandard 0 1.092
2 1754 881 No Central Air 2 1 0 195.930 141600 Albemarle 265 Substandard 0 1.416
3 1934 480 No Central Air 0 0 0 10.000 69200 Western Albemarle 85 Substandard 0 0.692
4 1963 720 No Central Air 2 1 0 1.000 139700 Western Albemarle 56 Substandard 0 1.397
... ... ... ... ... ... ... ... ... ... ... ... ... ...
3015 1965 1140 No Central Air 3 1 0 0.490 222600 Monticello 54 Excellent 0 2.226
3016 1995 6963 Central Air 4 5 1 8.820 2746700 Western Albemarle 24 Excellent 1 27.467
3017 1885 1744 Central Air 3 2 0 4.160 333000 Monticello 134 Excellent 1 3.330
3018 1988 1638 Central Air 4 3 0 3.815 257900 Albemarle 31 Excellent 0 2.579
3019 1955 1659 Central Air 2 2 0 0.523 286300 Albemarle 64 Excellent 0 2.863

3020 rows × 13 columns

In [84]:
sns.catplot(data = df, x = 'hsdistrict', y = 'price100', kind = 'box')
Out[84]:
<seaborn.axisgrid.FacetGrid at 0x304f7ed20>
[image: boxplots of price100 by hsdistrict]
  1. Show the relationships between the variables 'finsqft', 'totalvalue', 'lotsize' and 'age' using pairplot, coloring the points by the cooling system.

In [85]:
sns.pairplot(data = df, vars = ['finsqft', 'totalvalue', 'lotsize', 'age'], hue = 'cooling')
Out[85]:
<seaborn.axisgrid.PairGrid at 0x3027f20c0>
[image: pairplot of finsqft, totalvalue, lotsize and age, colored by cooling]
  1. Fit a linear regression model of 'totalvalue' (response) on 'finsqft' (explanatory variable), including the intercept. Are the estimated coefficients consistent with the scatterplot obtained in the previous step?

In [86]:
import statsmodels.api as sm

lm = sm.OLS(df.totalvalue, sm.add_constant(df.finsqft))
res = lm.fit()
print(res.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:             totalvalue   R-squared:                       0.566
Model:                            OLS   Adj. R-squared:                  0.566
Method:                 Least Squares   F-statistic:                     3937.
Date:                Tue, 21 Jan 2025   Prob (F-statistic):               0.00
Time:                        14:53:08   Log-Likelihood:                -41719.
No. Observations:                3020   AIC:                         8.344e+04
Df Residuals:                    3018   BIC:                         8.345e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const      -1.662e+05   1.04e+04    -15.935      0.000   -1.87e+05   -1.46e+05
finsqft      288.2442      4.594     62.748      0.000     279.237     297.251
==============================================================================
Omnibus:                     4319.800   Durbin-Watson:                   1.995
Prob(Omnibus):                  0.000   Jarque-Bera (JB):          2511875.784
Skew:                           8.117   Prob(JB):                         0.00
Kurtosis:                     143.351   Cond. No.                     5.38e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 5.38e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

The slope is positive, consistent with the previous scatterplot.

In [87]:
sns.lmplot(x = 'finsqft', y = 'totalvalue', data = df, ci = None)
Out[87]:
<seaborn.axisgrid.FacetGrid at 0x30565e870>
[image: scatterplot of totalvalue vs finsqft with fitted regression line]

Write a function to convert a length expressed in yards into a length expressed in meters, and vice versa. The function must take three keyword-only arguments: the first two keywords are the initial unit and the final unit, each of which may only take the values 'iarde' (yards) and 'metri' (meters), while the third keyword is the length, a real number to be converted. So, for example, if the initial unit is 'iarde' and the final unit is 'metri', the given length must be converted from yards to meters. If either of the first two keywords differs from 'iarde' or 'metri', and/or if the given length is negative, an error must be returned. Note that 1 yard = 0.9144 meters.

In [124]:
def conv(*, misura_iniziale, misura_finale, lunghezza):
    if misura_iniziale.lower() not in ['metri', 'iarde'] or misura_finale.lower() not in ['metri', 'iarde'] or lunghezza < 0:
        return 'errore'
    if misura_iniziale.lower() == 'metri' and misura_finale.lower() == 'iarde':
        return lunghezza/0.9144
    elif misura_iniziale.lower() == 'iarde' and misura_finale.lower() == 'metri':
        return lunghezza*0.9144
    else:
        return 'non è necessaria alcuna conversione'

print(conv(misura_iniziale = 'iarde', misura_finale = 'metri', lunghezza = 12))
print(conv(misura_iniziale = 'metri', misura_finale = 'iarde', lunghezza = 12))
10.9728
13.123359580052494
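Returning the string 'errore' means a caller cannot distinguish a failure from a valid result. A variant sketch (the name `conv_strict` is mine, not part of the original solution) that raises an exception instead:

```python
def conv_strict(*, misura_iniziale, misura_finale, lunghezza):
    # same conversion logic, but invalid input raises instead of returning a string
    unita = {'metri', 'iarde'}
    if misura_iniziale not in unita or misura_finale not in unita:
        raise ValueError("units must be 'metri' or 'iarde'")
    if lunghezza < 0:
        raise ValueError('length must be non-negative')
    if misura_iniziale == misura_finale:
        return lunghezza               # nothing to convert
    if misura_iniziale == 'iarde':
        return lunghezza * 0.9144      # yards -> meters
    return lunghezza / 0.9144          # meters -> yards
```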
In [95]:
import os
os.getcwd()
Out[95]:
'/Users/ludovicavargiu/Desktop/Laboratorio Python'
In [96]:
os.chdir('/Users/ludovicavargiu/Downloads')

diabete = pd.read_csv('diabete.csv')
In [97]:
diabete
Out[97]:
chol stab.glu hdl ratio glyhb age gender height weight bp.1s bp.1d waist hip time.ppn
0 203 82 56 3.6 0 46 female 62 121 118 59 29 38 720
1 165 97 24 6.9 0 29 female 64 218 112 68 46 48 360
2 228 92 37 6.2 0 58 female 61 256 190 92 49 57 180
3 78 93 12 6.5 0 67 male 67 119 110 50 33 38 480
4 249 90 28 8.9 1 64 male 68 183 138 80 44 41 300
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
361 301 90 118 2.6 0 89 female 61 115 218 90 31 41 210
362 296 369 46 6.4 1 53 male 69 173 138 94 35 39 210
363 284 89 54 5.3 0 51 female 63 154 140 100 32 43 180
364 194 269 38 5.1 1 29 female 69 167 120 70 33 40 20
365 199 76 52 3.8 0 41 female 63 197 120 78 41 48 255

366 rows × 14 columns

In [98]:
diabete.dtypes
Out[98]:
chol          int64
stab.glu      int64
hdl           int64
ratio       float64
glyhb         int64
age           int64
gender       object
height        int64
weight        int64
bp.1s         int64
bp.1d         int64
waist         int64
hip           int64
time.ppn      int64
dtype: object
  1. Select rows ten through twenty-five of the dataset and the columns 'hdl' and 'height', using both loc and iloc.

In [100]:
iloc = diabete.iloc[9:25, [2, 7]]
iloc
Out[100]:
hdl height
9 54 65
10 34 58
11 36 60
12 30 69
13 47 65
14 38 65
15 64 67
16 36 64
17 41 65
18 50 67
19 76 67
20 43 69
21 41 62
22 45 61
23 92 72
24 30 68
In [101]:
loc = diabete.loc[9:24, ['hdl', 'height']]
loc
Out[101]:
hdl height
9 54 65
10 34 58
11 36 60
12 30 69
13 47 65
14 38 65
15 64 67
16 36 64
17 41 65
18 50 67
19 76 67
20 43 69
21 41 62
22 45 61
23 92 72
24 30 68
  1. Select the female subjects older than 55.
In [103]:
diabete[(diabete['gender'] == 'female') & (diabete['age'] > 55)]
Out[103]:
chol stab.glu hdl ratio glyhb age gender height weight bp.1s bp.1d waist hip time.ppn
2 228 92 37 6.2 0 58 female 61 256 190 92 49 57 180
9 242 82 54 4.5 0 60 female 65 156 130 90 39 45 300
17 196 206 41 4.8 1 62 female 65 196 178 90 46 51 540
21 281 92 41 6.9 0 66 female 62 185 158 88 48 44 285
30 182 85 37 4.9 0 61 female 69 174 176 86 49 43 330
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
338 194 95 36 5.4 0 63 female 58 210 140 100 44 53 240
346 162 90 46 3.5 0 60 female 63 121 110 64 32 34 300
353 279 270 40 7.0 1 60 female 68 224 174 90 48 50 180
357 221 126 48 4.6 0 59 female 62 177 130 78 39 45 60
361 301 90 118 2.6 0 89 female 61 115 218 90 31 41 210

61 rows × 14 columns

  1. After selecting the subjects with a 'hip' measurement below 40, compute the mean of 'stab.glu' grouped by 'hip'.

In [125]:
reduc = diabete.loc[diabete['hip'] < 40, ['hip', 'stab.glu']]
reduc.groupby('hip').mean()
Out[125]:
stab.glu
hip
30 106.000000
32 77.000000
33 90.571429
34 81.800000
35 77.800000
36 79.333333
37 96.307692
38 100.961538
39 104.441176
  1. Count the number of men and women, and plot the histograms of height in two facets, split by gender.

In [116]:
diabete['gender'].value_counts()
Out[116]:
gender
female    214
male      152
Name: count, dtype: int64
In [117]:
sns.displot(data = diabete, x = 'height', col = 'gender')
Out[117]:
<seaborn.axisgrid.FacetGrid at 0x305ba5e50>
[image: histograms of height, faceted by gender]
  1. Show the distribution of cholesterol via boxplots, conditional on 'gender' and 'glyhb'.
In [126]:
sns.catplot(data = diabete, x = 'gender', y = 'chol', hue = 'glyhb', kind = 'box') 
Out[126]:
<seaborn.axisgrid.FacetGrid at 0x305b81dc0>
[image: boxplots of chol by gender, colored by glyhb]
  1. Show the relationships between the variables 'ratio', 'weight', 'waist' and 'hip' using pairplot, coloring the points by 'glyhb'.

In [119]:
sns.pairplot(data = diabete, vars = ['ratio', 'weight', 'waist', 'hip'], hue = 'glyhb')
Out[119]:
<seaborn.axisgrid.PairGrid at 0x305d57410>
[image: pairplot of ratio, weight, waist and hip, colored by glyhb]
  1. Fit a linear regression model of 'waist' (response) on 'weight' (explanatory variable), including the intercept. Are the estimated coefficients consistent with the scatterplot obtained in the previous step?

In [120]:
lm = sm.OLS(diabete.waist, sm.add_constant(diabete.weight))
res = lm.fit()
print(res.summary())

sns.lmplot(x = 'weight', y = 'waist', data = diabete, ci = None)
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  waist   R-squared:                       0.726
Model:                            OLS   Adj. R-squared:                  0.725
Method:                 Least Squares   F-statistic:                     963.4
Date:                Tue, 21 Jan 2025   Prob (F-statistic):          2.67e-104
Time:                        15:57:09   Log-Likelihood:                -926.02
No. Observations:                 366   AIC:                             1856.
Df Residuals:                     364   BIC:                             1864.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         16.2641      0.716     22.714      0.000      14.856      17.672
weight         0.1217      0.004     31.038      0.000       0.114       0.129
==============================================================================
Omnibus:                       18.991   Durbin-Watson:                   1.728
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               21.055
Skew:                           0.514   Prob(JB):                     2.68e-05
Kurtosis:                       3.570   Cond. No.                         822.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Out[120]:
<seaborn.axisgrid.FacetGrid at 0x306443fb0>

The slope is positive, consistent with the previous scatterplot.

In [128]:
cond = diabete.loc[diabete['hip'] < 40, ['hip', 'stab.glu']]
medie = cond.groupby('hip').mean()  # mean 'stab.glu' per hip value; must be assigned, or the result is discarded
print(cond)
     hip  stab.glu
0     38        82
3     38        93
15    39       112
16    34        81
22    38        66
..   ...       ...
347   38       267
354   38        81
358   39        81
360   39        85
362   39       369

[103 rows x 2 columns]

Julius Caesar used a cipher to send encrypted messages so that no one could decode them. The rule was to replace each letter of the alphabet with the letter three positions later: for example, A becomes D, B becomes E, and so on. Write a function that takes a string as input and converts it with Caesar's cipher, keeping in mind that the extended Italian alphabet has 26 letters and wraps around: W maps to Z, then X maps to A, Y to B, and Z to C. For example, the string 'ciao' would become 'fldr'. Do not consider strings containing digits or other symbols.

In [137]:
def crittografia(stringa):
    alfabeto = 'abcdefghijklmnopqrstuvwxyz'
    res = ''
    for char in stringa:
        # shift each letter three positions forward; % 26 wraps x, y, z around to a, b, c
        res += alfabeto[(alfabeto.index(char) + 3) % 26]
    return res

print(crittografia('ciao'))
print(crittografia('yoga'))
fldr
brjd
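For completeness, decryption is just a shift of three in the opposite direction; a minimal sketch (the function name `decrittografia` is my own choice):

```python
def decrittografia(stringa):
    alfabeto = 'abcdefghijklmnopqrstuvwxyz'
    res = ''
    for char in stringa:
        # shift each letter three positions back; % 26 handles the wrap-around
        res += alfabeto[(alfabeto.index(char) - 3) % 26]
    return res

print(decrittografia('fldr'))  # ciao
```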
In [139]:
df = pd.read_csv('diamonds.csv')
df
Out[139]:
carat cut color clarity depth table price x y z
0 1.19 Premium G VS1 61.2 55.0 7797 6.86 6.81 4.18
1 0.36 Very Good D VS2 62.4 58.0 780 4.48 4.53 2.81
2 0.30 Ideal E VVS2 61.8 56.0 789 4.31 4.33 2.67
3 0.38 Ideal E VVS2 61.6 55.0 1176 4.72 4.67 2.89
4 0.33 Ideal E VVS1 61.9 54.0 945 4.45 4.47 2.76
... ... ... ... ... ... ... ... ... ... ...
995 0.91 Ideal G SI1 60.8 58.0 4922 6.21 6.29 3.80
996 1.24 Good G VS2 61.3 57.0 8672 6.78 6.88 4.19
997 1.00 Good I VVS2 57.4 59.0 4032 6.61 6.53 3.77
998 1.53 Premium E SI1 60.8 60.0 12499 7.51 7.43 4.54
999 0.74 Ideal G SI1 61.3 57.0 3130 5.81 5.83 3.57

1000 rows × 10 columns

In [140]:
df.dtypes
Out[140]:
carat      float64
cut         object
color       object
clarity     object
depth      float64
table      float64
price        int64
x          float64
y          float64
z          float64
dtype: object
In [141]:
iloc = df.iloc[23:30, [1, 3]]
iloc
Out[141]:
cut clarity
23 Very Good VS2
24 Premium VS2
25 Premium VVS2
26 Ideal VVS1
27 Premium VS2
28 Ideal VS1
29 Ideal VS2
In [142]:
loc = df.loc[23:29, ['cut', 'depth']]
loc
Out[142]:
cut depth
23 Very Good 59.1
24 Premium 62.2
25 Premium 60.5
26 Ideal 61.1
27 Premium 58.8
28 Ideal 62.5
29 Ideal 62.9
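The two selections return the same rows because `loc` slices by label with an inclusive endpoint (23:29 gives seven rows), while `iloc` slices by position with an exclusive endpoint (23:30 also gives seven rows). A minimal sketch of the difference on a toy frame:

```python
import pandas as pd

toy = pd.DataFrame({'a': range(5)})
# iloc: positional, endpoint excluded -> rows at positions 1 and 2
print(toy.iloc[1:3])
# loc: label-based, endpoint included -> rows with labels 1, 2 and 3
print(toy.loc[1:3])
```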
In [143]:
df[(df['clarity'] == 'VS1') & (df['carat'] > 1)]
Out[143]:
carat cut color clarity depth table price x y z
0 1.19 Premium G VS1 61.2 55.0 7797 6.86 6.81 4.18
7 1.01 Fair I VS1 64.9 58.0 4263 6.17 6.22 4.02
14 1.51 Very Good E VS1 59.5 59.0 16129 7.48 7.54 4.47
107 1.09 Ideal G VS1 60.3 57.0 8305 6.68 6.73 4.04
113 1.04 Ideal G VS1 61.5 55.0 6831 6.54 6.55 4.02
118 1.01 Ideal G VS1 62.4 56.0 6066 6.37 6.42 3.99
136 1.21 Premium H VS1 62.3 58.0 7094 6.77 6.84 4.24
166 1.07 Premium G VS1 61.4 56.0 6076 6.65 6.61 4.07
190 1.35 Ideal G VS1 61.5 56.0 10378 7.15 7.12 4.39
191 1.55 Premium H VS1 62.6 58.0 11562 7.40 7.34 4.61
202 1.18 Ideal I VS1 62.2 55.0 6272 6.80 6.77 4.22
210 1.20 Good G VS1 63.6 58.0 8387 6.59 6.56 4.18
220 1.03 Ideal D VS1 61.5 57.0 8742 6.48 6.52 4.00
234 1.04 Ideal I VS1 62.9 43.0 4997 6.45 6.41 4.04
257 1.54 Ideal F VS1 60.3 57.0 18416 7.49 7.56 4.54
281 1.34 Very Good J VS1 61.7 59.0 6237 7.03 7.14 4.37
320 1.26 Ideal H VS1 61.5 59.0 7845 6.94 6.91 4.26
339 1.01 Premium D VS1 62.4 58.0 8265 6.38 6.41 3.99
347 1.14 Good I VS1 63.3 56.0 5056 6.60 6.68 4.20
403 1.27 Premium F VS1 60.3 58.0 10028 7.06 7.04 4.25
487 1.07 Very Good D VS1 59.9 55.0 9681 6.69 6.71 4.01
490 1.03 Premium H VS1 62.0 59.0 5523 6.45 6.48 4.01
508 1.51 Premium G VS1 59.5 59.0 14156 7.45 7.41 4.42
510 1.32 Ideal G VS1 62.4 53.0 10631 7.03 7.08 4.40
517 1.05 Ideal H VS1 61.8 55.0 6833 6.54 6.57 4.05
529 1.60 Premium J VS1 61.3 59.0 9032 7.52 7.49 4.60
550 1.23 Very Good F VS1 59.3 59.0 10609 6.98 7.01 4.15
560 1.51 Fair G VS1 64.9 55.0 11739 7.25 7.14 4.67
578 1.09 Ideal D VS1 62.3 56.0 9650 6.63 6.59 4.12
597 1.26 Premium F VS1 62.0 58.0 10669 6.95 6.88 4.29
605 1.01 Ideal G VS1 61.9 54.0 7179 6.44 6.48 4.00
661 1.01 Ideal G VS1 62.4 56.0 6672 6.39 6.44 4.00
712 1.20 Premium E VS1 60.7 57.0 10053 6.89 6.81 4.16
809 1.01 Very Good G VS1 63.6 57.0 6905 6.30 6.35 4.02
811 1.09 Premium J VS1 59.3 57.0 4303 6.79 6.74 4.01
830 1.08 Premium G VS1 62.0 60.0 6689 6.55 6.51 4.05
839 1.87 Ideal H VS1 59.7 60.0 17761 7.98 8.04 4.78
895 1.26 Ideal G VS1 62.3 57.0 6604 6.93 6.87 4.30
913 1.19 Ideal H VS1 62.1 54.0 7181 6.80 6.83 4.23
927 1.02 Good H VS1 59.8 63.0 5598 6.54 6.61 3.93
In [152]:
df['color'].value_counts()
Out[152]:
color
G    204
E    202
F    186
H    158
D    106
I    102
J     42
Name: count, dtype: int64
In [145]:
g = sns.FacetGrid(data = df, col = 'color', col_wrap = 3)
g.map(sns.histplot, 'depth')
Out[145]:
<seaborn.axisgrid.FacetGrid at 0x305e0fa70>
In [146]:
sns.catplot(data = df, x = 'cut', y = 'price', kind = 'box')
Out[146]:
<seaborn.axisgrid.FacetGrid at 0x30696db20>
In [148]:
sns.pairplot(data = df)
Out[148]:
<seaborn.axisgrid.PairGrid at 0x306b1a5a0>
In [149]:
lm = sm.OLS(df.price, sm.add_constant(df.carat))
res = lm.fit()
print(res.summary())

sns.lmplot(x = 'carat', y = 'price', data = df, ci = None)
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  price   R-squared:                       0.853
Model:                            OLS   Adj. R-squared:                  0.853
Method:                 Least Squares   F-statistic:                     5795.
Date:                Tue, 21 Jan 2025   Prob (F-statistic):               0.00
Time:                        16:58:31   Log-Likelihood:                -8755.3
No. Observations:                1000   AIC:                         1.751e+04
Df Residuals:                     998   BIC:                         1.752e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const      -2194.9975     94.871    -23.137      0.000   -2381.168   -2008.827
carat       7662.8765    100.659     76.127      0.000    7465.350    7860.403
==============================================================================
Omnibus:                      268.582   Durbin-Watson:                   1.977
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             1461.927
Skew:                           1.120   Prob(JB):                         0.00
Kurtosis:                       8.484   Cond. No.                         3.64
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Out[149]:
<seaborn.axisgrid.FacetGrid at 0x308e8b530>

The slope is positive, consistent with the scatterplot from the previous step.

In [ ]:
import random

def dado(n):
    # comb[i, j] counts how often the pair (i+1, j+1) was rolled
    comb = np.zeros((6, 6), dtype = 'int')
    # risultati: row 0 holds the observed sums (2..12), row 1 their frequencies
    risultati = np.zeros((2, 11), dtype = 'int')
    
    for _ in range(n):
        
        dado1 = random.randint(1, 6)
        dado2 = random.randint(1, 6)

        comb[dado1 - 1, dado2 - 1] += 1

        somma = dado1 + dado2
        risultati[0, somma - 2] = somma
        risultati[1, somma - 2] += 1
        
    return comb, risultati

print(dado(20))
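For large n, the same simulation can be vectorized with NumPy, drawing all the rolls at once; a sketch (the function name `dado_vec` and the `seed` parameter are my own additions):

```python
import numpy as np

def dado_vec(n, seed=None):
    rng = np.random.default_rng(seed)
    # n rolls of two dice, values in 1..6 (upper bound of integers() is exclusive)
    dadi = rng.integers(1, 7, size=(n, 2))
    # 6x6 table of pair counts, accumulated with unbuffered fancy indexing
    comb = np.zeros((6, 6), dtype=int)
    np.add.at(comb, (dadi[:, 0] - 1, dadi[:, 1] - 1), 1)
    # frequencies of the sums 2..12
    somme = np.bincount(dadi.sum(axis=1), minlength=13)[2:]
    return comb, somme

comb, somme = dado_vec(1000)
print(comb.sum(), somme.sum())  # both equal the number of rolls
```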
In [154]:
os.getcwd()
Out[154]:
'/Users/ludovicavargiu/Downloads'
In [155]:
os.chdir('/Users/ludovicavargiu/Desktop/Laboratorio Python')
In [156]:
data = pd.read_csv('myopia_study.csv')
In [157]:
data
Out[157]:
MYOPIC AGE GENDER SPHEQ AL ACD LT VCD SPORTHR READHR COMPHR STUDYHR TVHR MOMMY DADMY
0 1 6 1 -0.052 21.89 3.690 3.498 14.70 45 8 0 0 10 1 1
1 0 6 1 0.608 22.38 3.702 3.392 15.29 4 0 1 1 7 1 1
2 0 6 1 1.179 22.49 3.462 3.514 15.52 14 0 2 0 10 0 0
3 1 6 1 0.525 22.20 3.862 3.612 14.73 18 11 0 0 4 0 1
4 0 5 0 0.697 23.29 3.676 3.454 16.16 14 0 0 0 4 1 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
613 1 6 0 0.678 22.40 3.663 3.803 14.93 2 0 7 3 14 1 0
614 0 6 1 0.665 22.50 3.570 3.378 15.56 6 0 1 0 8 1 1
615 0 6 0 1.834 22.94 3.624 3.424 15.89 8 0 0 0 4 1 1
616 0 6 1 0.665 21.92 3.688 3.598 14.64 12 2 1 0 15 0 0
617 0 6 0 0.802 22.26 3.530 3.484 15.25 25 0 2 0 10 1 1

618 rows × 15 columns

In [158]:
data.dtypes
Out[158]:
MYOPIC       int64
AGE          int64
GENDER       int64
SPHEQ      float64
AL         float64
ACD        float64
LT         float64
VCD        float64
SPORTHR      int64
READHR       int64
COMPHR       int64
STUDYHR      int64
TVHR         int64
MOMMY        int64
DADMY        int64
dtype: object
In [159]:
iloc = data.iloc[2: 8, [3, 7]]
iloc
Out[159]:
SPHEQ VCD
2 1.179 15.52
3 0.525 14.73
4 0.697 16.16
5 1.744 15.36
6 0.683 15.49
7 1.272 15.08
In [160]:
loc = data.loc[2:7, ['SPHEQ', 'VCD']]
loc
Out[160]:
SPHEQ VCD
2 1.179 15.52
3 0.525 14.73
4 0.697 16.16
5 1.744 15.36
6 0.683 15.49
7 1.272 15.08
In [161]:
data[(data['AGE'] > 7) & (data['TVHR'] <= 10)]
Out[161]:
MYOPIC AGE GENDER SPHEQ AL ACD LT VCD SPORTHR READHR COMPHR STUDYHR TVHR MOMMY DADMY
71 0 9 0 0.118 24.24 3.678 3.533 17.03 7 0 0 4 4 1 1
72 0 8 1 0.672 22.99 3.798 3.822 15.37 26 9 1 10 7 0 1
112 0 8 0 0.588 22.59 3.492 3.322 15.77 9 2 1 2 4 1 1
147 0 9 1 1.368 23.75 4.030 2.960 16.76 9 2 1 5 7 0 0
165 0 8 1 1.012 22.41 4.048 3.460 14.90 16 0 0 3 9 0 0
185 0 8 1 0.542 21.77 3.650 3.554 14.57 21 5 3 5 9 0 1
227 0 8 1 1.031 22.33 3.660 3.364 15.30 4 3 0 3 6 1 1
230 0 8 1 0.625 23.24 3.930 3.294 16.01 14 5 7 3 9 1 0
233 0 8 0 0.112 23.69 3.997 3.500 16.20 9 4 4 3 7 0 0
240 0 8 0 0.442 22.29 3.417 3.547 15.33 14 4 1 3 9 0 1
251 0 8 0 0.882 22.53 3.382 3.514 15.64 16 2 3 3 9 0 1
253 0 8 1 0.048 22.39 3.483 3.640 15.26 22 8 0 3 8 0 0
260 0 8 1 0.044 22.28 3.386 3.308 15.58 18 3 0 2 9 0 0
273 0 8 1 0.904 23.31 3.630 3.460 16.22 7 4 0 7 4 0 1
294 0 8 1 0.680 22.38 3.838 3.568 14.97 9 7 4 10 6 1 0
303 1 9 0 -0.571 22.81 4.130 3.350 15.33 10 1 6 3 6 1 0
306 1 8 0 0.207 23.27 3.890 3.542 15.84 14 4 3 3 9 1 0
319 0 8 1 0.455 22.15 3.462 3.570 15.12 7 7 1 4 7 0 0
347 0 8 1 0.656 23.53 3.862 3.406 16.26 9 4 0 4 7 0 0
385 0 8 1 1.154 22.58 3.650 3.570 15.36 7 5 2 6 7 0 1
387 0 8 0 0.211 22.49 3.862 3.482 15.14 18 4 2 5 7 0 0
389 0 8 0 0.768 23.66 3.730 3.350 16.58 9 4 7 3 9 1 0
404 0 8 1 0.976 22.09 3.800 3.440 14.85 5 5 0 10 0 1 1
443 0 8 0 0.600 22.74 3.664 3.422 15.65 12 3 0 5 9 0 0
451 0 8 1 1.156 22.57 3.315 3.693 15.56 13 4 3 5 9 0 1
453 0 8 1 0.487 22.24 3.558 3.480 15.20 11 1 1 4 9 1 0
459 0 8 0 0.685 23.44 3.890 3.570 15.98 31 0 0 6 5 1 0
474 0 8 1 0.619 23.46 4.224 3.364 15.88 18 7 7 15 4 0 0
510 1 8 0 -0.339 22.94 3.634 3.640 15.66 22 7 7 3 9 0 1
514 1 8 0 0.269 23.11 3.680 3.333 16.10 5 4 5 3 7 1 1
521 0 8 0 0.781 23.88 4.028 3.388 16.46 9 2 4 3 0 1 1
548 0 8 0 0.307 23.14 4.020 3.210 15.91 9 9 0 7 2 0 1
552 0 8 1 0.725 21.84 3.502 3.454 14.89 0 3 5 5 5 1 0
593 0 8 1 3.731 22.22 3.200 3.420 15.60 15 3 5 12 10 0 0
611 1 8 0 -0.149 22.88 3.876 3.366 15.64 23 5 0 2 4 0 1
In [163]:
data['AGE'].value_counts().sort_index()
Out[163]:
AGE
5     21
6    456
7     82
8     53
9      6
Name: count, dtype: int64
In [164]:
sns.catplot(data, x = 'STUDYHR', col = 'GENDER', kind = 'count')
Out[164]:
<seaborn.axisgrid.FacetGrid at 0x30657dc70>
In [168]:
sns.catplot(data, x = 'MYOPIC', y = 'SPHEQ', hue = 'GENDER', kind = 'box')
Out[168]:
<seaborn.axisgrid.FacetGrid at 0x30d9ec800>
In [174]:
sns.pairplot(data, vars = data.loc[:, 'SPHEQ':'TVHR'])
Out[174]:
<seaborn.axisgrid.PairGrid at 0x306acfc80>
In [173]:
lm = sm.OLS(data.VCD, sm.add_constant(data.AL))
res = lm.fit()
print(res.summary())

sns.lmplot(x = 'AL', y = 'VCD', data = data, ci = None)
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                    VCD   R-squared:                       0.887
Model:                            OLS   Adj. R-squared:                  0.887
Method:                 Least Squares   F-statistic:                     4845.
Date:                Tue, 21 Jan 2025   Prob (F-statistic):          4.35e-294
Time:                        17:24:17   Log-Likelihood:                 50.775
No. Observations:                 618   AIC:                            -97.55
Df Residuals:                     616   BIC:                            -88.70
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -5.3161      0.297    -17.874      0.000      -5.900      -4.732
AL             0.9198      0.013     69.608      0.000       0.894       0.946
==============================================================================
Omnibus:                        0.154   Durbin-Watson:                   2.113
Prob(Omnibus):                  0.926   Jarque-Bera (JB):                0.250
Skew:                           0.010   Prob(JB):                        0.883
Kurtosis:                       2.904   Cond. No.                         747.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Out[173]:
<seaborn.axisgrid.FacetGrid at 0x30eecc9e0>

The slope is positive, consistent with the previous scatterplot.

Write a function that takes a positive integer n ≥ 1 and returns the sum of the squares of the first n positive integers, ∑_{k=1}^{n} k², or an error message if the condition on n is not satisfied. Write both a non-recursive and a recursive version of the function.

In [177]:
def func(n):
    somma = 0
    if n < 1:
        return 'errore'
    else:
        for i in range(1, n+1):
            somma += i**2
        return somma

print(func(5))
55
In [178]:
def func_r(n):
    if n < 1:
        return 'errore'
    elif n == 1:
        return 1
    else:
        return n**2 + func_r(n-1)

print(func_r(5))
55
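Both versions can be checked against the closed-form identity ∑_{k=1}^{n} k² = n(n+1)(2n+1)/6; a quick sketch (the function name `func_formula` is my own):

```python
def func_formula(n):
    if n < 1:
        return 'errore'
    # closed form: n(n+1)(2n+1)/6, always an integer, so // is exact
    return n * (n + 1) * (2 * n + 1) // 6

print(func_formula(5))  # 55
```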
In [179]:
df = pd.read_csv('nutri.csv')
df
Out[179]:
smoking gender age education weight height bmi
0 yes male 62 15+ 94.8 184.5 27.8
1 yes male 53 12-13 90.4 171.4 30.8
2 yes male 78 12-13 83.4 170.1 28.8
3 no female 56 15+ 109.8 160.9 42.4
4 no female 42 14-15 55.2 164.9 20.3
... ... ... ... ... ... ... ...
5390 yes female 76 12-13 59.1 165.8 21.5
5391 no male 26 15+ 112.1 182.2 33.8
5392 yes female 80 14-15 71.7 152.2 31.0
5393 yes male 35 <8 78.2 173.3 26.0
5394 no female 24 15+ 58.3 165.0 21.4

5395 rows × 7 columns

In [180]:
df.dtypes
Out[180]:
smoking       object
gender        object
age            int64
education     object
weight       float64
height       float64
bmi          float64
dtype: object
In [182]:
iloc = df.iloc[17:27, [1, 4]]
iloc
Out[182]:
gender weight
17 female 59.0
18 male 72.8
19 female 67.7
20 female 77.7
21 female 56.6
22 male 69.0
23 female 87.8
24 male 73.7
25 female 75.6
26 male 102.1
In [183]:
loc = df.loc[17:26, ['gender', 'weight']]
loc
Out[183]:
gender weight
17 female 59.0
18 male 72.8
19 female 67.7
20 female 77.7
21 female 56.6
22 male 69.0
23 female 87.8
24 male 73.7
25 female 75.6
26 male 102.1
In [184]:
df[(df['smoking'] == 'yes') & (df['height'] > 180)]
Out[184]:
smoking gender age education weight height bmi
0 yes male 62 15+ 94.8 184.5 27.8
30 yes male 56 <8 85.6 187.4 24.4
38 yes male 24 14-15 89.2 182.2 26.9
60 yes male 30 14-15 89.1 181.5 27.0
62 yes male 41 14-15 146.1 189.4 40.7
... ... ... ... ... ... ... ...
5277 yes male 29 14-15 60.1 183.0 17.9
5304 yes male 53 12-13 97.7 183.1 29.1
5325 yes female 55 14-15 123.0 181.5 37.3
5352 yes male 48 12-13 107.9 181.5 32.8
5361 yes male 33 8-11 63.0 180.3 19.4

236 rows × 7 columns

In [185]:
df[['smoking', 'gender', 'weight']].groupby(['gender', 'smoking']).mean()
Out[185]:
weight
gender smoking
female no 75.649094
yes 79.531342
male no 86.754267
yes 87.266423
In [186]:
sns.displot(data = df, x = 'height', col = 'gender')
Out[186]:
<seaborn.axisgrid.FacetGrid at 0x301b0fe90>
In [187]:
sns.catplot(data = df, x = 'education', y = 'bmi', kind = 'box')
Out[187]:
<seaborn.axisgrid.FacetGrid at 0x16a7d9e80>
In [188]:
sns.pairplot(df, hue = 'smoking')
Out[188]:
<seaborn.axisgrid.PairGrid at 0x304e6bf50>
In [189]:
lm = sm.OLS(df.bmi, sm.add_constant(df.weight))
res = lm.fit()
print(res.summary())

sns.lmplot(x = 'weight', y = 'bmi', data = df, ci = None)
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                    bmi   R-squared:                       0.778
Model:                            OLS   Adj. R-squared:                  0.778
Method:                 Least Squares   F-statistic:                 1.891e+04
Date:                Tue, 21 Jan 2025   Prob (F-statistic):               0.00
Time:                        17:48:12   Log-Likelihood:                -14152.
No. Observations:                5395   AIC:                         2.831e+04
Df Residuals:                    5393   BIC:                         2.832e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          6.1087      0.176     34.637      0.000       5.763       6.454
weight         0.2868      0.002    137.511      0.000       0.283       0.291
==============================================================================
Omnibus:                      287.794   Durbin-Watson:                   2.026
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              374.589
Skew:                           0.518   Prob(JB):                     4.56e-82
Kurtosis:                       3.771   Cond. No.                         329.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Out[189]:
<seaborn.axisgrid.FacetGrid at 0x305104f80>

The coefficient is positive, consistent with the scatterplot from the previous step.

When writing code, execution times matter, and most programming languages report them in seconds. Write a function that takes a positive integer representing the running time of a program in seconds and converts it to hours, minutes, and seconds. The function must return an error message if the input number is negative. For example, an input of 7622 corresponds to 2 hours, 7 minutes, and 2 seconds.

In [200]:
def time(n):
    if n < 0:
        return 'errore'
    ore = n//3600
    secondi_rimanenti = n%3600
    minuti = secondi_rimanenti//60
    secondi = secondi_rimanenti%60
    # print rather than return print(...), which would return None
    print(f'{ore} ore, {minuti} minuti e {secondi} secondi')

time(7622)
2 ore, 7 minuti e 2 secondi
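The same conversion can be written with the built-in `divmod`, which returns quotient and remainder in one step; a sketch (the function name `time_divmod` is my own):

```python
def time_divmod(n):
    if n < 0:
        return 'errore'
    ore, resto = divmod(n, 3600)      # hours and leftover seconds
    minuti, secondi = divmod(resto, 60)  # minutes and seconds
    return ore, minuti, secondi

print(time_divmod(7622))  # (2, 7, 2)
```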
In [202]:
os.getcwd()
Out[202]:
'/Users/ludovicavargiu/Desktop/Laboratorio Python'
In [203]:
os.chdir('/Users/ludovicavargiu/Downloads')
In [204]:
ais = pd.read_csv('AIS.csv')
In [205]:
ais
Out[205]:
Sex Sport LBM Ht Wt BMI SSF RBC WBC HCT HGB Ferr PBF
0 F BBall 63.32 195.9 78.9 20.56 109.1 3.96 7.5 37.5 12.3 60 19.75
1 F BBall 58.55 189.7 74.4 20.67 102.8 4.41 8.3 38.2 12.7 68 21.30
2 F BBall 55.36 177.8 69.1 21.86 104.6 4.14 5.0 36.4 11.6 21 19.88
3 F BBall 57.18 185.0 74.9 21.88 126.4 4.11 5.3 37.3 12.6 69 23.66
4 F BBall 53.20 184.6 64.6 18.96 80.3 4.45 6.8 41.5 14.0 29 17.64
... ... ... ... ... ... ... ... ... ... ... ... ... ...
197 M WPolo 82.00 183.9 93.2 27.56 67.2 4.90 7.6 45.6 16.0 90 11.79
198 M Tennis 72.00 183.5 80.0 23.76 56.5 5.66 8.3 50.2 17.7 38 10.05
199 M Tennis 68.00 183.1 73.8 22.01 47.6 5.03 6.4 42.7 14.3 122 8.51
200 M Tennis 63.00 178.4 71.1 22.34 60.4 4.97 8.8 43.0 14.9 233 11.50
201 M Tennis 72.00 190.8 76.7 21.07 34.9 5.38 6.3 46.0 15.7 32 6.26

202 rows × 13 columns

In [206]:
ais.dtypes
Out[206]:
Sex       object
Sport     object
LBM      float64
Ht       float64
Wt       float64
BMI      float64
SSF      float64
RBC      float64
WBC      float64
HCT      float64
HGB      float64
Ferr       int64
PBF      float64
dtype: object
In [207]:
iloc = ais.iloc[8:16, [2, 9]]
iloc
Out[207]:
LBM HCT
8 54.57 41.1
9 53.42 41.6
10 68.53 41.4
11 61.85 43.8
12 48.32 41.4
13 66.24 41.0
14 57.92 43.7
15 56.52 40.3
In [208]:
loc = ais.loc[8:15, ['LBM', 'HCT']]
loc
Out[208]:
LBM HCT
8 54.57 41.1
9 53.42 41.6
10 68.53 41.4
11 61.85 43.8
12 48.32 41.4
13 66.24 41.0
14 57.92 43.7
15 56.52 40.3
In [210]:
ais[(ais['Sex'] == 'M') & (ais['BMI'] > 25)]
Out[210]:
Sex Sport LBM Ht Wt BMI SSF RBC WBC HCT HGB Ferr PBF
107 M Swim 78.0 184.0 85.00 25.11 52.3 4.75 8.6 45.5 15.2 99 8.54
109 M Swim 81.0 187.2 92.00 26.25 65.3 4.87 4.8 44.9 15.4 124 11.72
112 M Swim 91.0 190.4 96.90 26.73 35.2 4.32 4.3 41.6 14.0 177 6.46
114 M Rowing 75.0 181.8 85.40 25.84 61.8 5.04 7.1 44.0 14.8 64 12.61
117 M Rowing 78.0 186.0 86.80 25.09 60.2 4.78 9.3 43.0 14.7 150 10.05
119 M Rowing 79.0 185.6 87.20 25.31 44.5 5.22 8.4 47.5 16.2 89 9.36
121 M Rowing 82.0 185.6 89.80 26.07 44.7 5.40 6.8 49.5 17.3 183 8.61
122 M Rowing 82.0 189.0 91.10 25.50 64.9 4.92 5.4 46.2 15.8 84 9.53
124 M Rowing 83.0 185.6 92.30 26.79 58.3 5.09 10.1 44.9 14.8 118 9.79
125 M Rowing 88.0 194.6 97.00 25.61 52.8 4.83 5.0 43.8 15.1 61 8.97
126 M Rowing 83.0 189.0 89.50 25.06 43.1 5.22 6.0 46.6 15.7 72 7.49
132 M BBall 97.0 209.4 113.70 25.93 88.9 5.17 8.0 47.9 16.4 36 14.53
134 M BBall 90.0 198.7 100.20 25.38 61.8 4.50 9.2 40.7 13.7 72 10.64
144 M Field 88.0 185.1 102.70 29.97 71.1 5.09 8.9 46.3 15.4 44 13.97
145 M Field 83.0 185.5 94.25 27.39 65.9 5.11 9.6 48.2 16.7 103 11.66
157 M TSprnt 75.0 178.5 80.20 25.17 30.3 4.88 4.3 45.6 15.5 80 6.76
159 M Field 102.0 185.0 111.30 32.52 55.7 5.48 4.6 49.4 18.0 132 8.51
161 M Field 78.0 180.1 97.90 30.18 112.5 5.01 8.9 46.0 15.9 212 19.94
162 M Field 106.0 189.2 123.20 34.42 82.7 5.48 6.2 48.2 16.3 94 13.91
175 M TSprnt 86.0 189.1 94.80 26.51 52.8 5.50 6.4 48.1 16.5 40 9.40
177 M Field 89.0 179.1 108.20 33.73 113.5 4.96 8.3 45.3 15.7 141 17.41
178 M Field 80.0 180.1 97.90 30.18 96.9 5.01 8.9 46.0 15.9 212 18.08
181 M WPolo 77.0 192.7 94.20 25.37 96.3 4.63 9.1 42.1 14.4 126 18.72
184 M WPolo 71.0 182.7 86.20 25.82 100.7 5.34 10.0 46.8 16.2 94 17.24
188 M WPolo 85.0 192.6 93.50 25.21 47.8 5.01 9.8 46.5 15.8 97 8.87
191 M WPolo 86.0 193.9 101.00 26.86 75.6 5.08 8.5 46.3 15.6 117 14.98
193 M WPolo 79.0 185.3 87.30 25.43 49.5 4.63 14.3 44.8 15.0 133 8.97
195 M WPolo 82.0 184.6 94.70 27.79 75.7 5.34 6.2 49.8 17.2 143 13.49
197 M WPolo 82.0 183.9 93.20 27.56 67.2 4.90 7.6 45.6 16.0 90 11.79
In [211]:
ais[['Wt', 'Sport']].groupby('Sport').mean()
Out[211]:
Wt
Sport
BBall 79.776000
Field 89.971053
Gym 43.625000
Netball 69.593478
Rowing 78.537838
Swim 75.145455
T400m 64.046552
TSprnt 71.506667
Tennis 64.472727
WPolo 86.729412
In [213]:
sns.displot(ais, x = 'HGB', col = 'Sex')
Out[213]:
<seaborn.axisgrid.FacetGrid at 0x3027f3f50>
In [215]:
sns.pairplot(ais, vars = ais.iloc[:, 2: 7], hue = 'Sex')
Out[215]:
<seaborn.axisgrid.PairGrid at 0x3052b7f50>
In [216]:
lm = sm.OLS(ais.Wt, sm.add_constant(ais.Ht))
res = lm.fit()
print(res.summary())

sns.lmplot(x = 'Ht', y = 'Wt', data = ais, ci = None)
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                     Wt   R-squared:                       0.610
Model:                            OLS   Adj. R-squared:                  0.608
Method:                 Least Squares   F-statistic:                     312.6
Date:                Tue, 21 Jan 2025   Prob (F-statistic):           9.64e-43
Time:                        18:15:57   Log-Likelihood:                -723.08
No. Observations:                 202   AIC:                             1450.
Df Residuals:                     200   BIC:                             1457.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       -126.1890     11.397    -11.073      0.000    -148.662    -103.716
Ht             1.1171      0.063     17.680      0.000       0.993       1.242
==============================================================================
Omnibus:                       57.269   Durbin-Watson:                   1.580
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              136.811
Skew:                           1.266   Prob(JB):                     1.96e-30
Kurtosis:                       6.137   Cond. No.                     3.35e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.35e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
Out[216]:
<seaborn.axisgrid.FacetGrid at 0x1750803b0>

The slope, 1.1171, is positive, consistent with the previous scatterplot.

In [227]:
# replace the values at twenty random positions of the 'Wt' column with NaN
c = ais['Wt'].copy()
pos = random.sample(range(202), k = 20)
c.loc[pos] = np.nan

c.info()
<class 'pandas.core.series.Series'>
RangeIndex: 202 entries, 0 to 201
Series name: Wt
Non-Null Count  Dtype  
--------------  -----  
182 non-null    float64
dtypes: float64(1)
memory usage: 1.7 KB
In [231]:
c2 = c.fillna(c.mean())
c2.info()
<class 'pandas.core.series.Series'>
RangeIndex: 202 entries, 0 to 201
Series name: Wt
Non-Null Count  Dtype  
--------------  -----  
202 non-null    float64
dtypes: float64(1)
memory usage: 1.7 KB
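`fillna` accepts any scalar, so the mean could be swapped for the median, which is more robust when the distribution is skewed; a small sketch on a toy Series:

```python
import pandas as pd
import numpy as np

s = pd.Series([1.0, np.nan, 3.0, 100.0])
# median of the non-missing values [1, 3, 100] is 3.0
print(s.fillna(s.median()))
```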
In [232]:
c2[pos]
Out[232]:
59     75.404396
168    75.404396
137    75.404396
65     75.404396
153    75.404396
34     75.404396
57     75.404396
86     75.404396
30     75.404396
149    75.404396
113    75.404396
28     75.404396
114    75.404396
73     75.404396
43     75.404396
55     75.404396
64     75.404396
7      75.404396
115    75.404396
54     75.404396
Name: Wt, dtype: float64
In [239]:
c2.to_csv('random.csv')
c2_csv = pd.read_csv('random.csv')
c2_csv
Out[239]:
Unnamed: 0 Wt
0 0 78.9
1 1 74.4
2 2 69.1
3 3 74.9
4 4 64.6
... ... ...
197 197 93.2
198 198 80.0
199 199 73.8
200 200 71.1
201 201 76.7

202 rows × 2 columns
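The `Unnamed: 0` column in the re-read file is the index that `to_csv` writes by default; passing `index=False` avoids it. A small sketch using an in-memory buffer instead of a file:

```python
import io
import pandas as pd

s = pd.Series([78.9, 74.4], name='Wt')
buf = io.StringIO()
s.to_csv(buf, index=False)  # index=False: no 'Unnamed: 0' column on re-read
buf.seek(0)
print(pd.read_csv(buf).columns.tolist())  # ['Wt']
```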

In [241]:
print(c2.to_string())
0       78.900000
1       74.400000
2       69.100000
3       74.900000
4       64.600000
5       63.700000
6       75.200000
7       75.404396
8       66.500000
9       62.900000
10      96.300000
11      75.500000
12      63.000000
13      80.500000
14      71.300000
15      70.500000
16      73.200000
          ...
197     93.200000
198     80.000000
199     73.800000
200     71.100000
201     76.700000
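The constant value 75.404396 recurring throughout the listing above is consistent with missing entries having been replaced by the column mean. A minimal sketch of that kind of imputation, on hypothetical data (the variable names here are illustrative, not taken from the data set):

```python
import pandas as pd
import numpy as np

# Hypothetical weights with missing values: fillna(mean) replaces every
# NaN with the same number, which then repeats in the printed Series.
wt = pd.Series([78.9, np.nan, 74.4, 69.1, np.nan])
wt_filled = wt.fillna(wt.mean())
print(wt_filled)
```

Every filled position receives the identical mean value, exactly the pattern visible in the output above.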
In [247]:
ais['bmi_qcut'] = pd.qcut(ais['BMI'], q = 4)
In [248]:
ais
Out[248]:
Sex Sport LBM Ht Wt BMI SSF RBC WBC HCT HGB Ferr PBF bmi_qcut
0 F BBall 63.32 (186.175, 209.4] 78.9 20.56 109.1 3.96 7.5 37.5 12.3 60 19.75 (16.749, 21.082]
1 F BBall 58.55 (186.175, 209.4] 74.4 20.67 102.8 4.41 8.3 38.2 12.7 68 21.30 (16.749, 21.082]
2 F BBall 55.36 (174.0, 179.7] 69.1 21.86 104.6 4.14 5.0 36.4 11.6 21 19.88 (21.082, 22.72]
3 F BBall 57.18 (179.7, 186.175] 74.9 21.88 126.4 4.11 5.3 37.3 12.6 69 23.66 (21.082, 22.72]
4 F BBall 53.20 (179.7, 186.175] 64.6 18.96 80.3 4.45 6.8 41.5 14.0 29 17.64 (16.749, 21.082]
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
197 M WPolo 82.00 (179.7, 186.175] 93.2 27.56 67.2 4.90 7.6 45.6 16.0 90 11.79 (24.465, 34.42]
198 M Tennis 72.00 (179.7, 186.175] 80.0 23.76 56.5 5.66 8.3 50.2 17.7 38 10.05 (22.72, 24.465]
199 M Tennis 68.00 (179.7, 186.175] 73.8 22.01 47.6 5.03 6.4 42.7 14.3 122 8.51 (21.082, 22.72]
200 M Tennis 63.00 (174.0, 179.7] 71.1 22.34 60.4 4.97 8.8 43.0 14.9 233 11.50 (21.082, 22.72]
201 M Tennis 72.00 (186.175, 209.4] 76.7 21.07 34.9 5.38 6.3 46.0 15.7 32 6.26 (16.749, 21.082]

202 rows × 14 columns
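`pd.qcut` splits a variable at its sample quantiles, so with `q=4` each resulting bin contains (approximately) the same number of observations, unlike `pd.cut`, which uses equal-width intervals. A small self-contained sketch on hypothetical BMI values:

```python
import pandas as pd

# Hypothetical BMI values; qcut with q=4 cuts at the quartiles,
# so the four bins end up with equal counts.
bmi = pd.Series([18.5, 20.1, 21.9, 22.4, 23.0, 24.1, 26.8, 30.2])
bins = pd.qcut(bmi, q=4)
print(bins.value_counts().sort_index())
```

With 8 values and `q=4`, every quartile bin holds exactly 2 observations; on the real data set the counts are only approximately equal because of ties.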
